Skip to content

roachtest: add pg_dump round-trip compatibility test #167543

Merged
trunk-io[bot] merged 4 commits into
cockroachdb:masterfrom
rafiss:pgdump-roachtest
May 22, 2026
Merged

roachtest: add pg_dump round-trip compatibility test #167543
trunk-io[bot] merged 4 commits into
cockroachdb:masterfrom
rafiss:pgdump-roachtest

Conversation

@rafiss

@rafiss rafiss commented Apr 5, 2026

Copy link
Copy Markdown
Collaborator

Add a nightly roachtest that validates pg_dump round-trip compatibility
with CockroachDB. The test builds pg_dump from PostgreSQL source
(REL_18_3), creates a representative schema in a source database,
dumps it with pg_dump --schema-only using the pg_dump_compatibility
session variable, restores into a target database via psql, dumps
again, and compares the two outputs.

Known deviations are tracked via an expected-diff baseline file in
testdata, following the same pattern as the pg_regress roachtest.

The schema exercises a broad surface area:

  • Enums, basic column types, default expressions, comments, and
    multi-schema objects.
  • Secondary indexes (partial, covering, expression, GIN on JSONB and
    text arrays, hash-sharded, vector).
  • Foreign keys covering every ON DELETE/UPDATE action, multi-column
    FKs, self-referencing FKs, and cyclic FKs added via ALTER.
  • Sequences with various options and OWNED BY columns.
  • Views (including multi-level dependencies) and materialized views.
  • Identity columns (BY DEFAULT and ALWAYS), generated/computed
    columns, composite primary keys.
  • User-defined functions, stored procedures, triggers, composite
    types, and row-level security policies.
  • Type variations: collated text, INET, BIT, BIT VARYING, INTERVAL
    with precision, parameterized TIMESTAMPTZ, NUMERIC, CHAR, and
    enum arrays.
  • Identifier quoting edge cases: reserved-keyword names, mixed-case
    identifiers, and identifiers near the 63-byte PG limit.

In addition, the test:

  • Creates 3100 filler tables before the schema to push descriptor IDs
    into the range of PostgreSQL catalog OIDs (1247-6104), verifying
    that tableoid remapping in pg_dump_compatibility mode handles
    collisions with built-in catalog OIDs like pg_class (1259) and
    pg_type (1247).
  • Verifies the source dump contains expected patterns (CREATE TABLE,
    CREATE FUNCTION, COMMENT ON, etc.) to catch cases where pg_dump
    silently omits objects from both dumps -- a gap the round-trip diff
    comparison cannot detect. Patterns can be marked as known failures
    to log warnings without failing the test.
  • Extracts ERROR lines from psql's stderr into a restore_errors.txt
    artifact so restore failures are visible even when psql exits 0.

This PR also has 3 commits that provide minor compatibility fixes to make pg_dump work:

  • sql: include vector dimension in format_type output
  • sql/parser: accept pg_catalog-qualified collation names in COLLATE
  • sql: hide synthetic primary key metadata for materialized views

Resolves: #167442
Epic: CRDB-28751

Release note: None

@trunk-io

trunk-io Bot commented Apr 5, 2026

Copy link
Copy Markdown
Contributor

😎 Merged successfully - details.

@blathers-crl

blathers-crl Bot commented Apr 5, 2026

Copy link
Copy Markdown

Your pull request contains more than 1000 changes. It is strongly encouraged to split big PRs into smaller chunks.

It looks like your PR touches SQL parser code but doesn't add or edit parser tests. Please make sure you add or edit parser tests if you edit the parser.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@cockroach-teamcity

Copy link
Copy Markdown
Member

This change is Reviewable

@rafiss rafiss force-pushed the pgdump-roachtest branch from 16384ef to a4f4f72 Compare May 7, 2026 14:39
@blathers-crl

blathers-crl Bot commented May 7, 2026

Copy link
Copy Markdown

Detected infrastructure failure (matched: self-hosted runner lost communication with the server). Automatically rerunning failed jobs. (run link)

@rafiss rafiss force-pushed the pgdump-roachtest branch from a4f4f72 to fa40329 Compare May 14, 2026 21:56
@rafiss rafiss marked this pull request as ready for review May 14, 2026 21:57
@rafiss rafiss requested review from a team as code owners May 14, 2026 21:57
@rafiss rafiss requested review from a team, golgeek, herkolategan, michae2, mw5h, spilchen and srosenberg and removed request for a team, golgeek, herkolategan and michae2 May 14, 2026 21:57

@spilchen spilchen left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice test!

@spilchen reviewed 11 files and all commit messages, and made 5 comments.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on golgeek, mw5h, rafiss, and srosenberg).


pkg/cmd/roachtest/tests/pg_dump.go line 451 at r1 (raw file):

		}
	}
	fillerDB.Close()

nit: any reason we can't use defer fillerDB.Close()?


pkg/cmd/roachtest/tests/pg_dump.go line 36 at r4 (raw file):

// that pg_dump needs to handle correctly when dumping a CockroachDB
// database.
var pgDumpTestSchema = `

I don't see DOMAIN in this schema. Unless it's trivial, I don't think we should add one here, as this PR is already pretty good. But we may want to add to the DOMAIN epic to track. I can add one if you agree.

In general this is the thing we should update as we add new pg compatible features. Do you have any ideas of how we can remember to update this? Maybe an update to pkg/sql/CLAUDE.md or pkg/sql/schemachanger/CLAUDE.md (less certain about that one)?


pkg/cmd/roachtest/tests/pg_dump.go line 704 at r4 (raw file):

		)
	}
	if err != nil {

is this err check meant as a catch for any error found when parsing the restore errors file above? If so, I don't see how err is updated.


pkg/sql/pg_catalog.go line 2505 at r4 (raw file):

}

// addRowForPgIndex generates a row for pg_index for a given table and index.

nit: this comment seems out of place how. Can we move it down to be just above addRowForPgIndex

@rafiss rafiss left a comment

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rafiss made 4 comments and resolved 2 discussions.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on golgeek, mw5h, spilchen, and srosenberg).


pkg/cmd/roachtest/tests/pg_dump.go line 451 at r1 (raw file):

Previously, spilchen wrote…

nit: any reason we can't use defer fillerDB.Close()?

done


pkg/cmd/roachtest/tests/pg_dump.go line 36 at r4 (raw file):

Previously, spilchen wrote…

I don't see DOMAIN in this schema. Unless it's trivial, I don't think we should add one here, as this PR is already pretty good. But we may want to add to the DOMAIN epic to track. I can add one if you agree.

In general this is the thing we should update as we add new pg compatible features. Do you have any ideas of how we can remember to update this? Maybe an update to pkg/sql/CLAUDE.md or pkg/sql/schemachanger/CLAUDE.md (less certain about that one)?

Sounds good. let's keep this PR focused. please go ahead and add a separate issue for DOMAIN. Feel free to assign it to me too; i'd like to make sure that works as well

for the reminder/documentation, I added a section to pkg/sql/schemachanger/CLAUDE.md since most new pg-compatible schema features land in the declarative schema changer


pkg/cmd/roachtest/tests/pg_dump.go line 704 at r4 (raw file):

Previously, spilchen wrote…

is this err check meant as a catch for any error found when parsing the restore errors file above? If so, I don't see how err is updated.

done; i improved the comment to say that this is meant for non-fatal psql errors


pkg/sql/pg_catalog.go line 2505 at r4 (raw file):

Previously, spilchen wrote…

nit: this comment seems out of place how. Can we move it down to be just above addRowForPgIndex

done

rafiss and others added 2 commits May 22, 2026 14:05
Add a weekly roachtest that validates pg_dump round-trip compatibility
with CockroachDB. The test builds pg_dump from PostgreSQL source
(REL_18_3), creates a representative schema in a source database,
dumps it with `pg_dump --schema-only` using the `pg_dump_compatibility`
session variable, restores into a target database via psql, dumps
again, and compares the two outputs.

Known deviations are tracked via an expected-diff baseline file in
testdata, following the same pattern as the pg_regress roachtest.

The schema exercises a broad surface area:

- Enums, basic column types, default expressions, comments, and
  multi-schema objects.
- Secondary indexes (partial, covering, expression, GIN on JSONB and
  text arrays, hash-sharded, vector).
- Foreign keys covering every ON DELETE/UPDATE action, multi-column
  FKs, self-referencing FKs, and cyclic FKs added via ALTER.
- Sequences with various options and OWNED BY columns.
- Views (including multi-level dependencies) and materialized views.
- Identity columns (BY DEFAULT and ALWAYS), generated/computed
  columns, composite primary keys.
- User-defined functions, stored procedures, triggers, composite
  types, and row-level security policies.
- Type variations: collated text, INET, BIT, BIT VARYING, INTERVAL
  with precision, parameterized TIMESTAMPTZ, NUMERIC, CHAR, and
  enum arrays.
- Identifier quoting edge cases: reserved-keyword names, mixed-case
  identifiers, and identifiers near the 63-byte PG limit.

In addition, the test:

- Creates 3100 filler tables before the schema to push descriptor IDs
  into the range of PostgreSQL catalog OIDs (1247-6104), verifying
  that tableoid remapping in pg_dump_compatibility mode handles
  collisions with built-in catalog OIDs like pg_class (1259) and
  pg_type (1247).
- Verifies the source dump contains expected patterns (CREATE TABLE,
  CREATE FUNCTION, COMMENT ON, etc.) to catch cases where pg_dump
  silently omits objects from both dumps -- a gap the round-trip diff
  comparison cannot detect. Patterns can be marked as known failures
  to log warnings without failing the test.
- Extracts ERROR lines from psql's stderr into a restore_errors.txt
  artifact so restore failures are visible even when psql exits 0.

Resolves: cockroachdb#167442
Epic: CRDB-28751

Release note: None

Co-Authored-By: roachdev-claude <roachdev-claude-bot@cockroachlabs.com>
format_type now emits `vector(N)` when atttypmod is set, instead of
dropping the dimension and returning bare `vector`. pgvector stores
the dimension directly in typmod (no -4 header offset).

Epic: CRDB-62553

Release note: None

Co-Authored-By: roachdev-claude <roachdev-claude-bot@cockroachlabs.com>
rafiss and others added 2 commits May 22, 2026 14:05
PostgreSQL keeps every collation in the pg_catalog namespace and emits
COLLATE clauses qualified that way. The grammar now accepts the
qualified form (only when the schema part is exactly pg_catalog),
strips the qualifier, and validates the trailing locale as before.
Other schemas are rejected to match PostgreSQL's "collation does not
exist" semantics.

Epic: CRDB-62553

Release note: None

Co-Authored-By: roachdev-claude <roachdev-claude-bot@cockroachlabs.com>
CockroachDB attaches a synthetic primary key on the hidden rowid column
to every materialized view, but PostgreSQL matviews carry no constraint
or index of their own. Consumers that walk pg_catalog (e.g. pg_dump)
saw the synthetic PK as both a pg_constraint row and a pg_index entry
and tried to recreate it.

Skip pg_constraint rows entirely for materialized views (matching PG's
"matviews have no constraints" behavior), and skip primary-key entries
in pg_index / pg_indexes for matviews unconditionally. While here, also
apply the existing pg_constraint hidden-PK filter to pg_index for
ordinary tables (gated on show_primary_key_constraint_on_not_visible_columns)
so the two views are consistent about what they expose.

Epic: CRDB-62553

Release note (sql change): pg_catalog no longer shows the synthetic
primary key constraint or its backing index for materialized views.
This matches PostgreSQL, which never attaches a constraint or index to
a matview implicitly.

Co-Authored-By: roachdev-claude <roachdev-claude-bot@cockroachlabs.com>
@rafiss rafiss force-pushed the pgdump-roachtest branch from fa40329 to 080ea6e Compare May 22, 2026 18:05
@rafiss rafiss requested review from a team as code owners May 22, 2026 18:05
@rafiss rafiss requested review from shghasemi and spilchen and removed request for a team May 22, 2026 18:05

@spilchen spilchen left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

@spilchen reviewed 2 files, made 2 comments, and resolved 1 discussion.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on golgeek, mw5h, rafiss, shghasemi, and srosenberg).


pkg/cmd/roachtest/tests/pg_dump.go line 36 at r4 (raw file):

Previously, rafiss (Rafi Shamim) wrote…

Sounds good. let's keep this PR focused. please go ahead and add a separate issue for DOMAIN. Feel free to assign it to me too; i'd like to make sure that works as well

for the reminder/documentation, I added a section to pkg/sql/schemachanger/CLAUDE.md since most new pg-compatible schema features land in the declarative schema changer

I added #170830. Thanks.

@rafiss

rafiss commented May 22, 2026

Copy link
Copy Markdown
Collaborator Author

/trunk merge

TFTR!

@trunk-io trunk-io Bot merged commit 49763f0 into cockroachdb:master May 22, 2026
26 checks passed
@rafiss rafiss deleted the pgdump-roachtest branch May 25, 2026 14:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

roachtest: add nightly pg_dump compatibility test using PostgreSQL's test suite

3 participants